Database Partitioning Strategies for Social Network Data
Abstract
In this thesis, I designed, prototyped and benchmarked two different data partitioning strategies for social network type workloads. The first strategy takes advantage of the heavy-tailed degree distributions of social networks to optimize the latency of vertex neighborhood queries. The second strategy takes advantage of the high temporal locality of workloads to improve latencies for vertex neighborhood intersection queries. Both techniques aim to shorten the tail of the latency distribution, while avoiding decreased write performance or reduced system throughput when compared to the default hash partitioning approach. The strategies presented were evaluated using synthetic workloads of my own design as well as real workloads provided by Twitter, and show promising improvements in latency at some cost in system complexity.
Thesis Supervisor: Stu Hood
Title: Engineer at Twitter
Thesis Supervisor: Samuel R. Madden
Title: Associate Professor
Similar resources
Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases
Social networks are large graphs that require multiple graph database servers to store and manage them. Each database server hosts a graph partition with the objectives of balancing server loads, reducing remote traversals (edge-cuts), and adapting the partitioning to changes in the structure of the graph in the face of changing workloads. To achieve these objectives, a dynamic repartitioning a...
Partitioning Graph Databases - A Quantitative Evaluation
The amount of globally stored, electronic data is growing at an increasing rate. This growth is both in size and connectivity, where connectivity refers to the increasing presence of, and interest in, relationships between data [12]. An example of such data is the social network graph created and stored by Twitter [2]. Due to this growth, demand is increasing for technologies that can process s...
Designing a trust-based recommender system in Social Rating Networks
One of the most common styles of business today is electronic business, since it is considered as a principal mean for financial transactions among advanced countries. In view of the fact that due to the evolution of human knowledge and the increase of expectations following that, traditional marketing in electronic business cannot meet current generation’s needs, in order to survive, organizat...
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing has increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Schism: a Workload-Driven Approach to Database Replication and Partitioning
We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partition...
Publication date: 2012